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Abstract 

Consider  the  task  of  finding  all  the  eigenvalues  of  a  dense  matrix. 
We  show  how  Hadamard’s  procedure  (1891)  can  be  organized  into 
Aitken’s  if -table  (1925)  and  how  the  if -table  may  be  transformed  into 
Rutishauser’s  gd-array  (1953)  with  the  help  of  the  Lanczos  algorithm. 
We  show  how  the  qd  algorithm  can  be  interpreted  as  defining  the 
LR  algorithm  (1958).  Finally  we  show  how  the  original  qd  algorithm 
may  be  transformed  into  the  shifted  differential  qd  algorithm  dqds 
developed  by  Fernando  and  Parlett  (1993/94).  The  Lanczos  algorithm 
takes  a  dense  matrix  into  tridiagonal  form  and  then  dqds  is  a  fast  and 
accurate  procedure  for  extracting  the  eigenvalues. 


This  paper  evolved  from  talks  given  at  UC,  Berkeley,  in  October  1992,  at 
the  Symposium  on  Scientific  Modelling,  Charleston,  Illinois  and  at  the  Mass. 
Inst,  for  Tech.,  both  in  October  1995. 
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1  Introduction 

The  story  to  be  told  here  is  not  strictly  a  contribution  to  the  history  of 
mathematics.  It  might  be  described,  using  a  fashionable  adjective,  as  virtual 
history.  With  the  benefit  of  hindsight  we  trace  the  transformations  of  an  idea 
from  its  original  pre-computing  form  into  a  powerful  and  elegant  algorithm 
that  is  ideally  suited  for  implementation  on  a  ‘parallel’  computing  system 
that  can  keep  many  processors  busy  at  the  same  time. 

Our  tale  begins  with  the  doctoral  dissertation  of  the  illustrious  French 
mathematician  Jacques  Hadamard  in  1891.  His  solution  to  the  problem, 
described  in  the  next  section,  yields  a  lousy  algorithm.  Our  title  intends  no 
disparagement  of  that  great  man;  neither  he  nor  any  of  his  contemporaries 
would  have  dreamt  of  evaluating  the  determinants  he  so  cleverly  introduced. 
Hadamard  ‘missed’  the  subsequent  evolution  of  his  idea  because  he  saw  no 
need  for  it. 

It  is  exciting,  and  a  little  intimidating  to  realize  how  much  our  notion  of 
the  ‘solution’  to  a  problem  has  changed  in  one  century.  The  story  is  ‘virtual’ 
history  because  the  next  two  investigators,  A.  C.  Aitken  and  H.  Rutishauser, 
were  not  directly  influenced  by  Hadamard’s  dissertation.  However  they  could 
have  been!  Aitken  rediscovered  Hadamard’s  idea  for  himself  (about  1925) 
and,  driven  by  the  existence  of  computing  machines,  though  hand  driven 
and  mechanical,  saw  the  weakness  in  the  formal  solution. 

What  prompted  our  choice  of  title  is  the  intriguing  fact  that  a  certain 
well  known  quadratic  identity  among  Hankel  determinants,  (7)  in  Section  4, 
that  Aitken  used  to  good  effect  was  stated  in  Hadamard’s  dissertation,  see  [8, 
p.  20,  formula  (14)],  and  used  in  the  analysis  but  not  exploited  to  compute 
the  determinants  with  minimal  effort. 

At  this  point  we  acknowledge  our  debt  to  Peter  Henrici  who  began  our 
story  in  [9].  Unfortunately  he  died  before  the  recent  investigations  that 
have  brought  Rutishauser’s  quotient-difference  (qd)  algorithm  back  into  the 
limelight  after  its  eclipse  in  the  1960s  by  the  QR  algorithm. 

After  describing  the  LR  algorithm  and  mentioning  QR  we  present  new 
forms  of  LR  and  qd  that  will  bring  them  back  onto  center  stage  for  both 
sequential  and  parallel  computation. 

These  sections  serve  to  round  out  the  study  we  began  in  [13]. 
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2  The  Task 

Consider  a  square  invertible  complex  matrix  B\  B  G  <CNxN .  The  eigenvalues 
{A,}  are  labelled  in  decreasing  order  by  magnitude 

0<L<|AJV|<|AJV_l|<...<|A1|.  (1) 

The  basic  goal  is  to  compute  B’’ s  spectrum  from  the  entries  of  B. 

In  order  to  invoke  Hadamard’s  thesis  the  basic  goal  is  mapped  into  a 
slightly  more  general  problem.  Given  B  and  any  two  (column)  vectors  u  and 
v  in  one  may  define  a  rational  function  /  by 

f(z)  :=  u*{IN  -  zB)-xv  (2) 

oo 

=  i^i<iAii-1’  (3) 

«'= 0 

where  the  row  vector  u*  is  the  conjugate  transpose  of  u ,  I ^  is  the  identity 
matrix  in  CNxN  ?  and  z  G  C. 

Note  that  (2)  holds  for  all  z  €  C,  2  ^  Ay-1,  j  =  1, . . . ,  N  while  (3)  holds 
only  in  a  neighborhood  of  the  origin.  In  principle  one  may  compute  as  many 
of  the  Taylor  coefficients  as  desired  from  the  formula 

di  =  u*B'v,  i  —  0,1,2,...  (4) 

The  poles  of  /  are  the  {Aj-1}  and,  when  they  are  simple  we  have  the  simplest 
partial  fraction  representation, 

'M  =  t  T^V  <5> 

J=1  3 

To  avoid  distracting  technical  complications  we  assume  throughout  that  the 
eigenvalues  are  simple. 

Now  we  can  formulate  Hadamard’s  thesis  problem.  Given  the  Taylor 
coefficients  (a0,  a1}  a2, . . .)  of  a  meromorphic  function  /  find  all  its  poles. 
The  generalization  from  rational  /  to  meromorphic  /  requires  no  new  ideas 
and  we  are  content  to  restrict  attention  to  the  form  (2).  In  1884  Konig,  see 
[10],  in  Germany,  had  shown  how  to  find  Ai,  when  |Aj|  >  |A2|,  namely 


®n+l 


OO. 
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Of  course  (A^-1 1  is  the  radius  of  convergence  of  (3)  but,  by  analytic  continu¬ 
ation,  the  sequence  (a,)  determines  /  everywhere  except  at  its  poles. 

Six  years  after  Konig’s  result  was  published  in  1884  there  was  still  no 
formula  for  extracting  the  larger  poles  from  (a,)  and  this  was  the  task  that 
Hadamard  selected. 

Even  a  century  later  students  in  a  complex  variable  course  complain  if 
asked  how  to  compute  Aj1.  The  clue  is  to  define  g(z)  =  (1  —  A iz)f(z),  find 
the  Taylor  coefficients  of  g  and  apply  Konig’s  result. 


3  Hadamard’s  Solution 

By  extending  Bernoulli’s  method  for  approximating  zeros  of  a  polynomial 
using  recurrences  Hadamard  found  the  key  that  unlocked  the  door  that  leads 
to  the  larger  poles  of  /,  namely  certain  Hankel  determinants  built  out  of  the 
coefficients  (a,).  Define,  for  nonnegative  integers  k  and  ?z, 


tin 

G"n+k— 1 

m 

:=  det 

1 

^n+2 

Qn+k 

Q'n+k—1 

. 

‘  *  <3n+2Ar~2 

HS 

:=  1,  Hf:-an. 

Please  note  the  different  roles  of  k  and  n. 

He  then  expanded  (5)  in  powers  of  z  in  order  to  obtain  expressions  for 
each  a,-  in  terms  of  the  {Aj}  and  then  substituted  the  new  expressions  in  an 
expansion  of  (6).  This  yields 

Theorem  1  As  n  — ►  oo.  with  f  of  the  form  (5), 


Hf  =  constant  •  (Ai  •  •  •  A*.)” 


1  +  0 


The  original  proof  is  in  [8]  but  an  easier  one  is  given  in  [9]. 


Corollary  1  For  large  enough  n,  Hf  ^  0. 
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Corollary  2  For  suitable  k  (i.  t.  |A^|  > 

»  A1A2  *  *  *  Afc,  n  — >  00. 


^+1 


Corollary  3  //  |Ajt_1|  >  |A*|  >  |A*+1|  then 

H£+1  EU 


Qk 


Ajt,  n  — >■  oo. 


Corollary  4  If  f  has  exactly  N  poles  (counting  multiplicities),  then 


Hn+i  =  all  n. 

We  mention,  in  passing,  that  when  | A*,—!  |  =  | A* |  =  |Afc+1|  the  situation  is  not 
hopeless,  merely  more  complicated.  Certain  combinations  of  neighboring  q £ 
converge  to  the  coefficients  of  a  polynomial  whose  zeros  are  the  eigenvalues 
with  modulus  |A*|.  If  any  A/,  is  multiple  Theorem  1  needs  to  be  modified,  see 

[7]- 

Hadamard  left  the  problem  at  this  point  and  the  remarks  we  make  below 
are  in  no  sense,  to  be  construed  as  criticism.  In  the  mathematics  of  his  day 
there  was  no  question  of  implementing  the  process  implied  by  Corollary  3. 
He  had  shown  exactly  how  the  poles  are  determined  by  the  coefficients  of  the 
Taylor  series,  a  problem  of  some  subtlety  whose  solution  had  eluded  earlier 
researchers  such  as  Konig. 

The  fact  remains  that  Hadamard’s  formulae  are  useless  for  computation 
in  finite  precision  arithmetic.  It  is  not  simply  the  labor  required  to  compute 
many  determinants.  In  general  the  poles  (Aj)-1,  j  >  1,  are  very  sensitive 
to  small  changes  in  the  coefficients  ( an ).  In  other  words  one  must  know 
the  ( an )  correct  to  many  decimals  in  order  to  determine  some  of  the  larger 
poles  to  just  one  or  two  decimals.  Incidentally  there  is  no  need  to  evaluate 
a  determinant  from  its  definition  as  a  signed  sum  of  products  of  entries.  By 
computing  a  triangular  factorization  any  k  x  Ic  determinant  may  be  found  in 
approximately  k3/ 3  scalar  multiplications. 

The  rest  of  the  story  shows  how  both  the  (Hjl)  and  the  ( an )  may  be 
replaced  by  other  quantities  that  determine  the  spectrum  of  B  in  a  less 
sensitive  manner. 
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4  Aitken’s  Scheme 

During  the  1920s  Aitken  worked  on  the  problem  of  finding  all  the  zeros  of  a 
polynomial  by  generalizing  Bernoulli’s  method  ,  see  [1],  and  was  led  to  study 
the  Hankel  determinants.  He  discovered  a  relation  which  allows  the  recursive 
calculation  of  the  H £  from  the  a,-.  This  is  most  easily  shown  by  arranging 
the  in  the  following  table: 

1 

1  H° 

1  HI  H° 

1  HI  H\  H°  (H  -  Table) 

1  Hf  HI  H\  H°4 

1  H\  Hi  Hi  H]  H° 

Note  that  H ”  =  an  and  so  the  first  two  columns  are  known. 

In  the  case  under  consideration  f(z )  has  only  N  poles  and  so  the  ff-table 
has  only  N  +  1  columns. 

The  ratios  of  successive  elements  in  the  kth  column  of  the  77-table  con¬ 
verge  to  the  product  of  the  largest  k  eigenvalues  by  Corollary  2,  provided  that 
|Ajt|  >  |Afc+1|.  By  Corollary  1  the  77-table  ultimately  exists,  i.  e.  H£  ^  0, 
k  <  TV,  n  large  enough.  However  it  is  possible  for  an  early  term  to  vanish 
and  this  would  prevent  the  recursive  computation  of  the  table  from  previous 
terms.  This  possibility  will  be  ignored  here. 

The  relation  which  Aitken  found  for  himself  [2],  but  which  had  been 
known  to  Hadamard,  is 

m  ?  -  «r'  +  fij;,1  n;:i  =  o.  (7) 

This  is  best  remembered  by  taking  any  H%  as  O  and  labelling  its  nearest 
neighbors  in  the  if-table  topographically  as  N ,  S ,  W,  and  E.  With  such  a 
stencil  (7)  can  be  written  as 


O2  =  N  S  -W  E.  (8) 

This  formula  can  be  used  in  two  different  ways: 

(a)  Start  with  the  first  two  columns  and  work  from  left  to  right  in  the  H- 
table,  adding  one  column  at  a  time  by  repeated  use  of  E  =  (NS  —  02)/W . 


6 


(b)  If  the  first  two  diagonals  are  known  one  can  compute  the  lower  diagonals 
successively  working  along  the  new  diagonal  from  left  to  right,  using  S  = 
(O2  —  W  E)/N. 

Now  (b)  has  the  difficulty  of  finding  the  first  two  diagonals  and,  although 
he  showed  how  to  do  this  for  a  polynomial  whose  coefficients  are  given,  Aitken 
in  [2]  favored  method  (a). 

For  the  computation  of  eigenvalues  the  generation  of  the  H- table  by  (a)  or 
(b),  though  feasible,  still  leaves  a  lot  to  be  desired.  With  (a)  it  is  necessary  to 
compute  an  =  H™  —  y*Anx  for  as  many  n  as  necessary  to  obtain  convergence 
of  i/1n+1  /#”  to  the  desired  degree  of  accuracy.  With  (b)  there  is  the  necessity 
of  computing  a,  for  i  =  0, 1, ... ,  21V  — 2  and  then  the  determinants  H °  and  Hi 
for  k  =  1, 2, . . . ,  N.  If  Gaussian  elimination  were  used  then,  whilst  calculating 
H°,  and  H\t,  all  other  Hf  and  Hj  ( i  =  1, 2, ...  N  —  1),  being  leading  principal 
minors,  could  be  found.  This  technique,  however,  would  not  permit  the  use 
of  interchanges  in  the  Gaussian  elimination  and  so  extra  precision  would 
be  required  to  ensure  adequate  accuracy.  (In  all,  21V 3  multiplications  are 
needed  for  the  a,-  and  |1V3  double  precision  multiplications  for  the  Hf  and 
Hf.)  Thus  both  (a)  and  (b)  demand  a  formidable  amount  of  computation 
before  the  remainder  of  the  //-table  can  be  found. 

Since  Hadamard  knew  the  quadratic  relation  (7)  he  could  easily  have 
anticipated  all  of  Aitken’s  results  but,  as  we  have  stressed,  the  motivation 
was  lacking. 

It  was  Rutishauser’s  achievement  to  produce  a  related  table  which  is 
numerically  preferable. 

5  The  qd  Algorithm 

What  changes  could  be  made  in  the  iif-scheme  which  might  reduce  the 
amount  of  initial  computation?  There  is  one  natural  observation:  the  H% 
themselves  are  of  less  interest  than  the 

i 

since  for  |Aj  |  >  | A2 1  >  . . .  |  Ayv  U  as  n  — >  00 

ql  —  A,. 
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As  Henrici  says  in  [9]:  ‘It  is  remarkable  that  in  the  computation  of  the 
the  determinants  H£  do  not  have  to  be  used  if  a  set  of  auxiliary  quanti¬ 
ties  is  introduced.’  Their  introduction  is  presented  as  a  ‘fait  accompli’  by 
Rutishauser  in  [14].  Certainly  this  set  of  quantities  arises  naturally  from 
the  continued  fraction  expansion  of  f(z )  (see  [9,  p.  36])  but  this  author  feels 
that  a  motivation  for  the  choice  of  the  auxiliary  quantities,  called  e£,  can 
be  made  in  the  present  context.  This  is  our  ‘virtual’  history.  We  show  how 
Rutishauser  might  have  derived  qd  from  Aitken’s  if-table. 

The  key  is  to  recast  the  basic  recursion  relation  (8)  so  that  it  shows 
explicitly  the  connection  between  ql  and  q%+i.  To  do  this  clearly,  consider 
a  typical  portion  of  the  iif-table  obtained  by  placing  the  following  stencil  on 
the  table  so  that  D  is  on  H£. 


■  A  ■ 

B  C  D  • 

•  E  F  G 

■  ■  H  I 


Then 


AF 
CD ’ 


<£+1  = 


CH 

EF' 


To  include  all  these  elements  two  applications  of  (8)  are  needed,  centered  at 
C  and  F  respectively,  namely 


C2  =  AE-  BD ,  F2  =  DH  -  EG. 


Now  turn  first  terms  on  the  right  hand  sides  into  ql  and  g£+1  respectively, 
obtaining 

CF  _  AF  BF  CF  CH  CG 

DE  ~  CD  CE  DE  ~  EF  DF‘  ^ 

Reference  to  the  stencil  shows  that  BF/CE  and  CG/DF  are  of  the  same 
form  and  naturally  ask  to  be  named.  If  CG/DF  (=  H^H^+1/HJ/H^+1)  is 
called  e£  then  BF/CE  will  be  Transposing  the  second  terms  on  the 

right  hand  sides  of  (9)  gives  the  following  simple  relation  between  the  q  and 
the  e, 


nn  -4-  pn  —  nn+1  4-  <=n+1 
<lk  +  ek  —  %  +  ek-\ ■ 


(10) 
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The  quadratic  relation  (8)  has  become  a  linear  relation  (10). 

Another  relation  between  the  q  and  the  e  is  required  to  determine  all  the 
g”  and  e£  from  the  9”  (=  an+\/an).  It  follows  directly  from  the  definitions 
of  the  q  and  e.  A  simple  derivation  comes  from  observing  that 

CH  El  _CI  _  DI  CG 
EF  '  FH  ~  F2  ~  GH  '  DF' 

In  other  words 

*"+1  ■  e;+1  =  (11) 


The  qd  Scheme 

Formula  (10)  shows  that  e£  is  the  difference  of  the  quotients  q%+1  and  qjt 
modified  by  This  indicates  a  little  why  Rutishauser  named  the  following 

triangular  table  of  <?’ s  and  e’s  the  Quotient-Difference  Scheme.  Typical  cases 
of  (10)  and  (11)  are  shown  pictorially  in  Figure  1  and  explain  why  E.  Stiefel 
named  them  the  rhombus  rules.  (10)  yields  rhombi  centered  on  ^-columns, 
(11)  yields  rhombi  centered  on  e-columns. 

By  alternate  application  of  the  rules  the  table  may  be  generated,  in  theory, 
form  either 

(a)  the  first  q- column  moving  right,  or 

(b)  the  first  diagonal  moving  down. 

Now,  however,  method  (a)  is  quite  impractical  since  rule  (10)  shows  that 

e?  =  ?,”+'  -  «.”• 

For  | A!  |  >  | A2 1 ,  <7”  — >  Ai,  as  n  — »  00.  As  n  increases  e”  becomes  the 
difference  of  two  almost  equal  numbers  and,  for  computation  with  a  fixed 
number  of  digits  for  each  number,  it  will  have  fewer  and  fewer  significant 
figures.  Thus  (e"+1/e”)  and  hence  q%  will  have  little  or  no  accuracy.  Brieffy, 
one  says  that  method  (a)  is  ‘unstable’.  Method  (b),  on  the  other  hand,  uses 
(10)  in  the  form 

%  ~  Vk  T  ek+1  - 

which  is  ultimately  completely  safe  when  e£+1  and  — >  0. 

The  danger  in  method  (b)  is  that  in  the  early  stages,  for  small  n,  large 
values  of  may  occur  which  will  cause  loss  of  significant  digits  in  the  g”+1. 
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e 


(o) 

1 


e 


(3) 

2 


Figure  1:  The  qd  table 


However  with  the  qd  scheme  there  is  no  alternative  but  to  use  (b),  the  so 
called  progressive  method.  Rutishauser  in  [16]  has  analyzed  the  effect  of  one 
small  g£+1  and  the  resulting  large  value  of  e£+1  (in  using  (11)).  He  observed 
that  only  negative  powers  of  e£+1  occur  in  diagonal  n  +  3  and  below.  Thus 
the  disturbance  affects  only  the  next  two  diagonals  and  he  proposed  some 
auxiliary  modifications  to  be  made  in  calculating  some  of  the  elements  in 
diagonals  n  +  1  and  n  +  2.  For  diagonal  n  +  3  and  below  the  straightforward 
application  of  the  rhombus  rules  is  again  sufficient.  The  recently  discovered 
differential  form  of  qd ,  see  Section  8,  goes  a  long  way  towards  avoiding  the 
need  for  these  modifications. 


6  Calculation  of  the  First  Diagonal 

At  this  point  it  is  worth  observing  that 

n  _  Hl+l  _  U*Bn+1V 

9l  ~  H?  ~  u*Bnv 

and  so  the  descent  of  column  1  of  the  qd  scheme  corresponds  to  the  well 
known  power  method  [17]. 

How  can  the  first  diagonal  of  the  qd  scheme  be  found  indirectly?  The 
answer  is,  surprisingly,  by  use  of  Lanczos  method  of  ‘minimized  iterations’, 
see  [11],  with  initial  vectors  u  and  v  (from  the  definition  of  f(z)  in  Section  2). 
Thus,  as  Henrici  points  out,  the  qd  scheme  links  the  power  method  to  the 
method  of  Lanczos. 

The  reader  is  referred  to  [11]  and  [9]  for  proofs  and  details  of  the  algo¬ 
rithm.  It  produces  a  tridiagonal  matrix  J  similar  to  B  and  of  the  form  (for 
N  =  5) 

Cti  1 

Pi  <*2  1 

J  —  @2  a3  1 

Pz  a4  1 

Pa  <*5 

By  the  LDU  theorem  [3,  p.  20],  J  (or  J  +  si)  may  be  written  as  the  product 
LR 


■  1 

’?i  1 

ea  1 

$2  1 

e2  1 

,  R  = 

93  1 

e3  1 

q4  l 

e4  1 

95 

Is  is  easily  verified  that  for  k  =  1, 2, ...  N, 

Oik  —  Qk  +  Ok-i,  Oo  —  Oy 
Pk  =  qk^k,  =  0. 


Henrici  shows  [9,  p.  33]  that  these  qk  and  ek  are  the  elements  of  the  first 
diagonal  of  the  qd  scheme.  In  fact  he  shows  that  the  nth  diagonal  can  be 
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obtained  in  the  same  way  by  using  Lanczos  method  with  initial  vectors  Bnu 
and  v. 


a 


n 

1 


u*Bn+1v 

u*Bnv 


as  it  should. 

When  |Ai|  >  | A2 1  >  ...|Ajv|  then  q%  — +  A&,  e£  — *  0,  as  n  — *  00. 
When  there  are  eigenvalues  with  the  same  modulus  then  the  corresponding 
q  and  e  columns  may  not  converge.  Fortunately  this  does  not  impair  the 
usefulness  of  the  algorithm.  Discussion  of  what  to  do  in  this  case  is  most 
simply  presented  in  connection  with  the  LR  transformation  and  will  appear 
in  the  next  section. 


7  The  LR  Transformation 


In  Section  6  it  was  shown  that  the  elements  q°  and  e°  of  the  first  diagonal  of 
the  qd  scheme  are  obtained  from  the  LR  decomposition  of  a  certain  tridiag¬ 
onal  matrix  J,  now  to  be  called  Jo,  'whose  nonzero  elements  in  column  i  are 
1,  this  implies  t] 


(12) 


,  for  k  = 

1,2,.. 

■,N, 

eU  = 

aL 

e°  =  0, 

II 

O  ^ 
QJ 

O  -V 

Cn 

PL 

II 

O 

The  rhombus  rules,  (10)  and  (10),  which  yield  the  next  diagonal  have  a  very 
interesting  interpretation.  Recall  from  the  previous  section  that 


1 

'  Qi  1 

e°  1 

q°2  1 

Lq  — 

1 

,  Ro  — 

q°s  1 

e°  1  . 

<75°. 

From  (10)  and  (11)  and  the  rules  of  matrix  multiplication  follow  the  result 


q\  +  e\- 1  —  9°  +  e°k  —  (RoL0)kk  =  cx\,  (say), 


q\e\  =  Q°k+iek  =  (RoLo)k,k+i  =  PL  (say). 


(13) 
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It  is  easily  verified  that  RqLq  is  a  tridiagonal  matrix  of  the  same  form  as 
Jo  and  may  be  called  J\.  Now  comparison  of  (13)  with  (12)  shows  that  the 
second  diagonal  of  the  qd  scheme  (gf,ej)  contains  precisely  the  elements  of 
the  LR  decomposition  of  RqLq  (=  Ji).  Thus 

Jo  is  factorized  into  LoRo, 

The  factors  are  multiplied  in  reverse  order  to  give  RqL0  =  Ji, 

Ji  is  factorized  into  L\R\. 

Similarly  the  third  diagonal  of  the  qd  scheme  contains  the  elements  of  the 
LR  decomposition  of  R\L\  (=  J2),  and  so  on  for  all  the  diagonals.  In  fact, 
for  each  positive  integer  n,  equations  (12),  with  superscript  n,  define  the 
elements  a",  of  a  tridiagonal  matrix  Jn.  The  Jn  may  be  read,  as  it  were, 
‘between  the  (diagonal)  lines’  of  the  qd  scheme.  Clearly  Jn+1  =  J“1JnJn 
and  if  the  eigenvalues  A,  of  Jo  (or  the  original  A)  have  distinct  moduli  then, 
n  — ►  00, 

Jn  — >  Roc  (a  triangular  matrix  with  q =  A*). 

Thus  the  qd  scheme  implicitly  defines  a  convergent  sequence  of  tridiagonal 
matrices  similar  to  Jo.  this  was  a  brilliant  insight  by  Rutishauser. 

For  tridiagonal  matrices  which  are  symmetric  positive  definite  the  qd 
algorithm  may  be  used  since  the  danger  (instability)  mentioned  in  Section  4 
cannot  occur.  This  technique  is  from  three  to  six  times  as  fast  as  the  much 
used  Sturm  sequence  method,  see  [6],  See  Rutishauser  [16]  for  full  details. 

Dense  Matrices 

This  interpretation  of  the  qd  scheme  suggests  a  similar  algorithm  for  full 
nonsingular  matrix  B\.  By  the  LDU  theorem,  [3,  p.  20],  B\  (or  B\  +  sl)  may 
be  factorized  into  the  product  of  left  triangular  matrix  L\  with  l’s  on  the 
diagonal  and  right  triangular  matrix  R1.  The  LR  transformation  is  defined 
by  Rutishauser  in  [15]  as  follows.  For  k  =  1,2, .. . 

f  Factorize  Bk  into  LkRk, 

\  Form  RkLk  and  call  it  Bk+i  ■ 

Clearly  Bk+i  =  L^BkLk  and  it  may  be  asked  when  this  sequence  {Bk} 
converges  to  a  triangular  matrix. 

The  answer,  see  [15],  is  contained  in  the  following 


13 


Theorem  2  If  By  =  U  diag( A1?  •  •  • ,  \N)U~l  and 

(a)  |Ai|  >  |A2|  >  •••  >  |A/v|, 

(b)  The  leading  principal  minors  of  U  and  U _1  are  nonzero,  then  \im  Bk  = 
exists  and  is  upper  triangular  with  (Boo) a  =  A,-. 

If  the  |  Ai|  are  not  distinct  then,  loosely  speaking,  Bk  may  be  said  to  converge 
to  block  triangular  form.  More  precisely  suppose  that  |Ai|  =  |A2|  =  •••  = 
|AP|  >  |Ap+i|.  Strictly  speaking  B^  may  not  exist  but,  k  — ►  oo,  Bk  becomes 
reduced.  The  elements  in  the  first  p  columns  may  not  converge  but  the 
characteristic  polynomial  of  the  leading  principal  submatrix  of  order  p  does 
converge  to  the  monic  polynomial  with  roots  Al5 . . . ,  Xp.  The  complementary 
submatrix  has  eigenvalues  which  converge  to  Ap+1, . . . ,  XN.  If  any  of  these 
eigenvalues  have  equal  modulus  then  this  submatrix  will  become  reduced 
also.  Thus  as  k  — ►  oo  the  submatrix  blocks  which  become  isolated  along 
the  diagonal  correspond  to  groups  of  eigenvalues  of  equal  modulus.  In  the 
qd  scheme,  analogously,  when  successive  q  and  e  columns  fail  to  converge, 
it  turns  out  that  certain  polynomials,  formed  from  the  qs  and  e’s  in  those 
columns  and  in  the  nth  diagonal,  do  converge  as  n  — ►  oo. 

In  the  important  case  of  real  matrices  with  complex  conjugate  pairs  of 
eigenvalues,  Bk  may  be  expected  to  have  along  the  diagonal,  for  large  k, 
isolated  2x2  real  submatrices  which  yield  the  eigenvalues  very  conveniently. 
This  was  the  reason  for  stating  in  the  previous  section  that  the  occurrence 
of  eigenvalues  of  equal  modulus  does  not  impair  the  qd  algorithm. 

Shifts  of  origin  may  be  used  with  both  the  qd  algorithm  and  the  LR 
algorithm  to  hasten  convergence.  Thus  for  scalars  aj  one  defines 

Bj  c  j  I  —  L  j  Ftj , 

Bj+ 1  :=  RjLj  +  ajI. 

A  good  strategy  for  choosing  {crj}  is  important  in  practice  but  that  is  not 
the  focus  of  this  paper.  See  [15]  and  [4]  for  more  on  shifts. 

One  might  think  of  the  LR  algorithm  as  the  final  transformation  of 
Hadamard’s  scheme  into  a  practical  algorithm.  However  the  search  is  not 
over  yet. 

The  possible  element  growth  that  can  occur  in  LR  prompted  the  search 
for  more  stable  schemes.  About  1960  J.  G.  F.  Francis  presented  his  QR 
algorithm  applied  to  matrices  in  Hessenberg  form  and  this  has  been  the 


method  of  choice  for  computing  eigenvalues  for  over  30  years.  It  does  not 
preserve  bandwidth  for  nonsymmetric  matrices. 


8  Differential  qd 

The  two  rhombus  rules  that  govern  the  qd- array  are  invoked  in  alternating 
fashion  to  compute  the  next  line  of  the  array.  Shifts  a  may  be  introduced 
and  the  result  is  Rutishauser’s  progressive  qd  algorithm.  This  may  be  written 
symbolically  as 

qds  :  (g,e)  (<?,e). 

The  code  is 


qi  =  q\  +  ei  -  a 
for  j  =  1,2,  1 

ej  =  ej  *  (qj+i/qj) 

Qj+ 1  —  Qj+ 1  +  ej+ 1  ~  —  <7 


Here  =  0  and  e n  =  0. 

In  terms  of  the  LR  algorithm  of  the  previous  section,  qds  yields  the  fac¬ 
torization 

UL-al  =  LU.  (14) 

Now  we  look  at  this  triangular  factorization  again,  in  closer  detail,  to  derive 
a  differential  algorithm  that  computes  the  same  quantities  with  slightly  more 
arithmetic  effort  than  qds. 

Look  at  the  reduction  of  UL  —  al  to  U  at  an  intermediate  step.  Thus  the 
array  is 

0,  qu- i,  1 

0,  1 

£kqk+ 1>  qk+1  +  Cfc+l  —  <7,  1 

tk+iqk+2,  *,  1 

The  key  is  to  write  the  new  pivot  %  as  a  sum  of  <4  +  e*,  introducing  a  new 
arraj^  d.  The  next  step  in  elimination  is 


15 


ZkQk+'l  / ([ki 

<4+i  +  ek+1  =  qk+1  +  ek+1  -  a  -  qk+1ek/(dk  +  ek). 

The  beautiful  feature  is  that  e^t+i  is  removed  from  each  side  analytically 
without  even  being  added  to  qk+\.  Simplifying  each  side  yields 

dk+i  =  dk(qk+i/qk)  —  a. 

This  gives  the  dqds  algorithm  (differential  qd  with  shifts)  introduced  by  Fer¬ 
nando  and  Parlett  [4]. 

dqds  di  =  q1  —  a 

for  k  =  1 , 2, . . . ,  N  —  1 
qk  =  dk  +  ek 
h  =  ek(qk+1/qk) 

<4+i  =  dk(qk+1/qk )  -  a 

qN  =  <4*'  • 

Because  of  the  repeated  expression  (qk+i/qk)  the  algorithm  may  be  imple¬ 
mented  with  only  one  division  in  the  inner  loop. 

Should  qk  =  0  the  algorithm  will  break  down.  Nevertheless  in  the  ab¬ 
sence  of  such  a  breakdown,  the  algorithm  enjoys  a  remarkable  mixed  stability 
property.  When  executed  in  finite  precision  it  is  only  necessary  to  change 
appropriately  the  input  (q,e)  and  the  output  (q,e)  by  1,  2,  or  3  units  in  the 
last  place  of  each  variable  in  order  to  have  an  exact  dqds  transformation.  See 
[4]  for  the  full  story. 

One  attractive  consequence  is  that  when  q,e,q  and  e  are  positive  then 
the  eigenvalues  of  UL  and  U L  agree  to  high  accuracy,  however  small  some 
of  those  eigenvalues  may  be. 

The  additional  variables  {dj}  seemed  like  a  blemish  at  first  but  it  turns 
out  that  they  yield  useful  approximations  to  the  smallest  eigenvalue  and  so 
improve  shift  selection. 


9  The  Implicit  LR  Algorithm 

The  LR  algorithm  introduced  by  Rutishauser  produces  a  sequence  of  matrices 
on  the  assumption  that  triangular  factorization  is  permitted 


16 


at  each  step.  Thus  Bj  =  LjUj.  For  several  reasons  it  would  be  preferable  to 
discard  the  Bj  in  favor  of  the  pairs  Lj,  Uj.  In  order  to  do  that  it  is  necessary 
to  derive  LJ+i,  UJ+\  from  Lj,  Uj  without  forming  BJ+i  :=  UjLj  explicitly.  To 
avoid  subscripts  we  write 

LU  =  B  :=  UL.  (15) 

Appeal  to  the  two-sided  Gram-Schmidt  process  shows  that  (15)  implies  the 
existence  of  unique  matrices  F,  G  such  that 

L  =  GU,  U  =  LF FG  =  I.  (16) 

The  practical  point  is  that  when  L  and  U  are  bidiagonal  then  F  and  G  may 
be  generated  implicitly  as  the  product  of  very  simple  matrices.  Even  when 
L  and  U  are  dense  the  F  and  G  matrices  may  be  generated  as  products  of 
elementary  matrices  of  the  form  (I  —  xy t). 

Shifts  may  be  introduced  to  accelerate  convergence  and  the  new  equations 
are 

L  —  GU,  U  -  aL-1  =  LF,  FG  =  /.  (17) 

It  turns  out  that  the  presence  of  L~l  does  not  greatly  complicate  the  repre¬ 
sentation  of  F  and  G.  For  more  information  see  [5]. 

This  algorithm  is  attractive  when  B  has  narrow  bandwidth  and  one  can 
avoid  reducing  to  tridiagonal  form. 


10  Parallel  Features 

The  simplicity  of  the  dqds  algorithm  facilitates  its  adaptation  to  modern 
computer  architectures,  particularly  to  systems  with  many  arithmetic  units. 
The  algorithm  sketched  below  is  presented  for  its  intellectual  interest.  It  is 
very  fast  but  is  also  unstable,  [12].  Nevertheless  the  idea  is  worth  knowing. 

Consider  the  dqds  algorithm  again  and  observe  that  if  the  dj,  j  =  1, . . . ,  N 
were  all  known  then  the  new  q  and  e  values  may  be  obtained  in  fully  parallel 
form  with  each  processor  responsible  for  a  given  subset  of  array  elements. 
Let  v  T  denote  the  array  obtained  by  pushing  up,  by  one  index,  the  entries 
of  v  and  inserting  0  for  the  last  position.  As  usual  e(./V)  =  e(iV)  =  0. 

Parallel  dqds:  q  —  d  +  e, 

e  =  e*((q  1)  +  q). 
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The  hard  task  is  to  compute  d  fast.  This  may  be  done  in  21og2  N  time  despite 
the  fact  that  he  recurrence  seems  intrinsically  nonlinear  and  sequential; 


dk+ 1  —  Qk+ldk / (dli  “f"  &k)  <J. 


(18) 


The  form  changes  radically  when  dk  is  written  as  a  quotient  dk  =  fk/gk- 
Then  (18)  becomes 

f k+l  fki^Qk-k  1  7*)  & &k9k 

9k+ 1  fk  +  Ckgk 

or,  in  vector  notation, 


fk+l 

9k+ 1 


9k+ 1  — 

1, 


-crek 

ek 


=  wk 


fk 

9k 


WkWk-i  •  •  •  it7!  ^ : 


(19) 


for  k  =  1, . . . ,  JV  —  1.  Note  that  all  Wj  are  known  in  advance,  from  q,  e  and 
cr,  and  we  want  all  the  partial  products. 

With  N,  or  N/2,  processors  this  task  can  be  accomplished  in  only  21og2  N 
time!  The  technique  is  known  as  ‘parallel  prefix’  among  computer  scientists. 
When  a  =  0  the  Wj  are  lower  triangular  and  the  products  are  simplified. 

The  idea  of  the  algorithm  is  best  understood  by  a  diagram,  see  Figure  2. 
On  the  other  hand  it  can  be  specified  in  a  very  short  procedure.  Let  m  = 
log2  N  and  initialize  an  array  Y  to  Yj  —  Wj ,  j  =  l,...,jV.  Upon  exit  Yk 
holds  Wk Wk-i  ■■■W1. 

In  pseudo  code  we  need  only  two  lines. 


for  j  =  1°,  22, . . . ,  2m"\  [  for  i  =  2 j,  N,  2 j  in  parallel  Y{i)  =  Y{i)  *  Y(i  -  j )  ] 

for  j  =  2m-2, 2m-3, . . . ,  2°,  [  for  *  =  3 j,  N,  2 j  in  parallel  Y(i)  =  Y(i)  *  Y(i  -  j )  ] 

The  third  variable,  2 j  in  the  for  i  expression  is  called  the  stride;  i  = 
2j,4j,  67, . . . ,  N.  The  first  line  of  the  algorithm  is  called  going  down  the 
tree  and  the  second  line  is  called  going  up  the  tree. 
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1  1-2 


1  1-2  1-3 


1-8 


1-7  1-8 


Figure  2:  Parallel  prefix 


11  Conclusion 

In  the  1920s  A.  C.  Aitken  rediscovered  much  of  the  material  in  Hadamard’s 
thesis  and  in  addition,  worried  about  speed  versus  accuracy  when  computing 
the  #-table.  We  showed  here  (Section  5)  that  Rutishauser’s  gd-table  can 
be  seen  as  a  reformulation  of  the  H -table  to  give  a  more  robust  array.  In 
fact  Rutishauser’s  motivation  came  from  continued  fractions  and  not  from 
Aitken’s  work. 

It  is  quite  well  known  that  the  ^d-algorithm  enabled  Rutishauser  to  dis¬ 
cover  the  LR  algorithm.  For  tridiagonals  LR  may  be  viewed  as  a  reformu¬ 
lation  of  qd  and  thus  as  a  descendent  of  the  if-table.  Instead  of  focusing 
on  QR  as  a  stable  variation  of  LR,  as  in  [13],  we  have  chosen  to  present 
recent  variants  on  qd  and  LR  which  seem  preferable  to  the  original  ones. 
The  new  techniques  preserve  bandwidth  and  can  afford  to  have  a  much  more 
sophisticated  shift  strategy  than  QR.  It  remains  to  be  seen  whether  they  will 
displace  QR.  In  any  case  they  may  be  perceived  as  computationally  desirable 
descendants  of  Hadamard’s  if-table. 
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